{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "view-in-github"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/peremartra/Large-Language-Model-Notebooks-Course/blob/main/3-LangChain/3_3_Data_Analyst_Agent.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "DPu6Q1vlol-r"
   },
   "source": [
    "<div>\n",
    "    <h1>Large Language Models Projects</a></h1>\n",
    "    <h3>Apply and Implement Strategies for Large Language Models</h3>\n",
    "    <h2>3.3-Create a Data Analyst Assistant using a LLM Agent.</h2>\n",
    "</div>\n",
    "\n",
    "by [Pere Martra](https://www.linkedin.com/in/pere-martra/)\n",
    "________\n",
    "Model: OpenAI\n",
    "\n",
    "Colab Environment: CPU.\n",
    "\n",
    "Keys:\n",
    "* Agents\n",
    "* LLMAgent\n",
    "\n",
    "Related article: [Create Your Own Data Analyst Assistant With Langchain Agents](https://pub.towardsai.net/create-your-own-data-analyst-assistant-with-langchain-agents-722f1cdcdd7e?sk=7add94d79b9ca41ccbbc60828d7f60f1)\n",
    "_______\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "SutfCuwbr0QB"
   },
   "source": [
    "# Create an LLMAgent with LangChain\n",
    "\n",
    "We will create an agent using LangChain and the OpenAI API to analyze data within Excel files.\n",
    "\n",
    "This agent will have the ability to identify relationships between variables, clean the data, select appropriate models, and execute them to generate future predictions.\n",
    "\n",
    "In essence, it will function as a data scientist assistant, streamlining our day-to-day tasks."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "UYWYQH6burJZ"
   },
   "source": [
    "## Intalling and importing libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "01iDISWNRzd5",
    "outputId": "3cff27c6-0ef2-474d-cdbd-158fc5e3bc99"
   },
   "outputs": [],
   "source": [
    "!pip install -q langchain==0.1.2\n",
    "!pip install -q langchain_experimental==0.0.49\n",
    "!pip install -q langchain-openai==0.0.2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "RGX5QZNgy7WW"
   },
   "source": [
    "We use the **os** library to store Environ variables. Like OPENAI_API_KEY.\n",
    "\n",
    "Get you OpenAI API  Key: https://platform.openai.com/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "ErpDywmqURto",
    "outputId": "3959bebd-5c4c-4b66-ac0a-9c0bc938ea8d"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "from getpass import getpass\n",
    "os.environ[\"OPENAI_API_KEY\"] = getpass(\"OpenAI API Key: \")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "DDdr_KjCwkbm"
   },
   "source": [
    "## Loading the Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "xP3D13awC5Y6",
    "outputId": "07c3f57e-6c59-43d8-e202-094aeb745d53"
   },
   "outputs": [],
   "source": [
    "from google.colab import drive\n",
    "drive.mount('/content/drive')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "DipEHmiMC_cw",
    "outputId": "d73dfc18-a97d-4f2e-8f0a-934a8d5cf3ed"
   },
   "outputs": [],
   "source": [
    "!pip install kaggle"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "v8wQ7oAuDzJx"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "#This directory should contain you kaggle.json file with you key\n",
    "os.environ['KAGGLE_CONFIG_DIR'] = '/content/drive/MyDrive/kaggle'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "HC2kOxJgD264",
    "outputId": "c57c4ba7-4d59-4168-e600-08a31ec26bbc"
   },
   "outputs": [],
   "source": [
    "!kaggle datasets download -d goyaladi/climate-insights-dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "NuWeRYwtECbH"
   },
   "outputs": [],
   "source": [
    "import zipfile\n",
    "\n",
    "# Define the path to your zip file\n",
    "file_path = '/content/climate-insights-dataset.zip'\n",
    "with zipfile.ZipFile(file_path, 'r') as zip_ref:\n",
    "    zip_ref.extractall('/content/drive/MyDrive/kaggle')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "jJbLzm-rzrZ5"
   },
   "source": [
    "We are using a kaggle Dataset:\n",
    "https://www.kaggle.com/datasets/goyaladi/climate-insights-dataset\n",
    "\n",
    "You can download the CSV. Feel free to use the Dataset you are more interested In, or your own Data.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "49sY2-63Rxzp"
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "csv_file='/content/drive/MyDrive/kaggle/climate_change_data.csv'\n",
    "#creating the document with Pandas.\n",
    "document = pd.read_csv(csv_file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 347
    },
    "id": "iyJbbQZBUOXm",
    "outputId": "985d6a0c-2156-4532-95ac-8c693ce91348"
   },
   "outputs": [],
   "source": [
    "document.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "kABkxcdEaq7r"
   },
   "outputs": [],
   "source": [
    "#If you want to use your own CSV just execute this Cell\n",
    "#from google.colab import files\n",
    "\n",
    "#def load_csv_file():\n",
    "#  \"\"\"Loads a CSV file into a Pandas dataframe.\"\"\"\n",
    "\n",
    "#  uploaded_file = files.upload()\n",
    "#  file_path = next(iter(uploaded_file))\n",
    "#  document = pd.read_csv(file_path)\n",
    "#  return document\n",
    "\n",
    "#if __name__ == \"__main__\":\n",
    "#  document = load_csv_file()\n",
    "#  print(document)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "VduNnDSMnILu"
   },
   "source": [
    "# Creating the Agent\n",
    "This is the easiest Agent we can create with LangChain, we only need to import the **create_pandas_dataframe_agent**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "bCGv4U6p1mcC"
   },
   "source": [
    "Time to create our little assistant, and we need only a call.\n",
    "\n",
    "We let **OpenAI** decide which model to use. However, we specify a **temperature** value of 0 to its parameter, so that it is not imaginative. This is much better when we want the model to be able to give commands to the different libraries it can use.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "E9uDnPbWXCwb"
   },
   "outputs": [],
   "source": [
    "from langchain.agents.agent_types import AgentType\n",
    "from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent\n",
    "\n",
    "from langchain_openai import ChatOpenAI\n",
    "from langchain_openai import OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "VyTyzCXDTrGU"
   },
   "outputs": [],
   "source": [
    "sm_ds_OAI = create_pandas_dataframe_agent(\n",
    "    OpenAI(temperature=0),\n",
    "    document,\n",
    "    verbose=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "R7LkZ8jOG_Cz"
   },
   "outputs": [],
   "source": [
    "sm_ds_Chat = create_pandas_dataframe_agent(\n",
    "    ChatOpenAI(temperature=0),\n",
    "    document,\n",
    "    handle_parsing_errors=True,\n",
    "    verbose=True\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "asS_GhDCa2lL"
   },
   "source": [
    "We are going to test 2 different models. The recommendation is use the one created withg the class OpenAI, but judge it by yourself."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "25lfRzgDbS8S"
   },
   "source": [
    "## First Question."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "fTagQKYzUWF0",
    "outputId": "9b5c952a-2d5b-4a5f-d145-61fb46a6a800"
   },
   "outputs": [],
   "source": [
    "sm_ds_OAI.invoke(\"Analyze this data, and write a brief explanation around 100 words.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "9EiLTOo2cB79",
    "outputId": "03412ae8-cb07-4fac-94a3-7410b2a90969"
   },
   "outputs": [],
   "source": [
    "document.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "M6O3TZX7bB4j"
   },
   "source": [
    "The description of the data made by the Agent is acurated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "Ftxg9UwVY4w1",
    "outputId": "fa1bbb4b-ace0-49ad-d353-023668112bd1"
   },
   "outputs": [],
   "source": [
    "sm_ds_Chat.invoke(\"\"\"Analyze this data, and write a brief explanation around 100 words. \"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Cw1tU1OIbPrH"
   },
   "source": [
    "The second Agent is unable to solve this question.\n",
    "\n",
    "## Second Question."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000
    },
    "id": "wCZaMyRuUfK7",
    "outputId": "9ec690d2-dc6e-4701-80dc-671c5bfec7ee"
   },
   "outputs": [],
   "source": [
    "sm_ds_OAI.run(\"Do you think is possible to forecast the temperature?\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "80lvpAWpb8Ho"
   },
   "source": [
    "The model thinks that is possible to forecast the temperature, but difficult because the weak correlation between variables.\n",
    "\n",
    "I don't now why the model decided to create a graphic bar."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 628
    },
    "id": "ce4mvng6abJG",
    "outputId": "c5700b24-55ed-43b5-c834-42821d32a577"
   },
   "outputs": [],
   "source": [
    "sm_ds_Chat.run(\"Do you think is possible to forecast the temperature?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "R5-Dlmy9caEZ"
   },
   "source": [
    "The second Agent, created with ChatOpenAI has a different opinion."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "TOt76k0EM1yh"
   },
   "source": [
    "## Third question"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000
    },
    "id": "UvE5a5N_yNxq",
    "outputId": "82e5b545-e026-4257-8e95-2cc075b3a161"
   },
   "outputs": [],
   "source": [
    "sm_ds_OAI.run(\"\"\"\n",
    "Can you create a line graph containing the anual average co2 emissions over the years?\n",
    "\"\"\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "rDUvtWEScyfy"
   },
   "source": [
    "## Fourth question"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000
    },
    "id": "R_9Lp3JZfHGn",
    "outputId": "cf7f6881-b556-4f60-a592-ed619c5aec8c"
   },
   "outputs": [],
   "source": [
    "sm_ds_OAI.run(\"\"\"\n",
    "Create a line graph with seaborn containing the anual average co2 emissions in Portugal over the years.\n",
    "\"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xgcnGNkQdbax"
   },
   "source": [
    "## Last Question."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "uYtojJhberB4",
    "outputId": "d1612387-b46d-4d36-daef-23ac71d8edcc"
   },
   "outputs": [],
   "source": [
    "sm_ds_OAI.invoke(\"\"\"Select a forecasting model to forecast the temperature.\n",
    "Use this kind of model to forecast the average temperature\n",
    "for year in Port Maryberg in Malta for the next 5 years.\n",
    "Write the temperatures forecasted in a table.\"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "o_o-i2BNXBe6"
   },
   "source": [
    "# Conclusions\n",
    "\n",
    "This is one of the most powerful and, at the same time, easiest to use agents. We have seen how with just a few lines of code, we had an agent capable of following our instructions to analyze, clean, and generate charts from our data. Not only that, but it has also been able to draw conclusions and even decide which algorithm was best for forecasting the data.\n",
    "\n",
    "The world of agents is just beginning, and many players are entering the field, such as Hugging Face, Microsoft, or Google. Their capabilities are not only growing with new tools but also with new language models.\n",
    "\n",
    "**It's a revolution that we cannot afford to miss and will change many things.**"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "gpuType": "T4",
   "include_colab_link": true,
   "machine_shape": "hm",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
